Visualization in R

An SC3L Workshop

Tyler Wiederich

University of Nebraska-Lincoln

About us

  • We are the Statistical Cross-disciplinary Collaboration & Consulting Lab (SC3L) from the UNL Department of Statistics.
  • We offer free statistical consulting services to students, faculty, and staff at UNL.
  • Workshops! Hosted from 1-2pm on Wednesdays and Thursdays.

Data Visualization in R

Why visualize data?

Data visualization is an important step in understanding the relationships among the variables in your dataset. It helps answer questions such as:

  • How does Factor A affect my response?
  • Is there an interaction between Factor A and Factor B?
  • Are there outliers in my dataset?

The basics

R has multiple methods of creating visualizations, but our focus will be on the ggplot2 package. This package follows the Grammar of Graphics approach, layering building blocks (data, aesthetic mappings, geoms, scales, facets, and themes) to produce a graph.

The basics: data format

Data should be in a tidy format: each row is one observation and each column is one variable.

Trt1 Trt2 Rep response
1 1 1 9.37
2 1 1 9.04
1 2 1 10.84
2 2 1 10.94
1 1 2 10.00
2 1 2 9.63
1 2 2 9.62
2 2 2 10.00

Example: not tidy

Average diamond price by cut and color
color   Fair    Good   Very Good   Premium   Ideal
E       11156   4535   1703        4739      1799
G       5924    NA     2684        5720      3266
H       3862    NA     4527        5438      3448
D       NA      2406   3424        1809      5855
F       NA      4549   8948        2880      1780
I       NA      1952   4532        4480      3734
J       NA      NA     NA          6348      1720

Example: tidy

Average diamond price by cut and color.
cut color price
Fair E 11156
Fair G 5924
Fair H 3862
Good D 2406
Good E 4535
Good F 4549
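
A wide table like the "not tidy" example can be reshaped into the tidy layout in code. The sketch below is one possible approach, assuming the tidyr package (not otherwise used in this workshop) is installed. The object names wide and tidy are made up for illustration, and only the Fair and Good columns for colors E, G, and H are retyped from the tables above.

```r
library(tidyr)  # assumption: tidyr is installed; it is not part of the workshop setup

# Wide (not tidy) layout: one column per cut, values retyped from the table above
wide <- data.frame(
  color = c('E', 'G', 'H'),
  Fair  = c(11156, 5924, 3862),
  Good  = c(4535, NA, NA)
)

# Tidy layout: one row per cut/color combination, empty cells dropped
tidy <- pivot_longer(wide, cols = -color,
                     names_to = 'cut', values_to = 'price',
                     values_drop_na = TRUE)
tidy
```

Each remaining row now holds one cut, one color, and one price, which is the shape ggplot2 expects.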

The basics: syntax

ggplot(data = data, mapping = aes(...)) + 
  geom_FUNCTION(aes(...), ...) + 
  scale_FUNCTION(...) +
  facet_FUNCTION(...) + 
  labs(title = '', subtitle = '', x = '', y = '') + 
  theme_FUNCTION(...) + 
  coord_FUNCTION(...)
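
As a concrete instance of this template, the minimal sketch below fills in only the data, mapping, geom, labs, and theme layers. It assumes ggplot2 is already installed and uses the built-in mtcars dataset, which is not part of this workshop's examples; the object name p is arbitrary.

```r
library(ggplot2)

# Data and aesthetic mapping first, then one layer per building block
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +                                            # geom_FUNCTION
  labs(x = 'Weight (1000 lbs)', y = 'Miles per gallon') +   # axis labels
  theme_bw()                                                # theme_FUNCTION

p  # printing the object draws the plot
```

Layers that are left out (scales, facets, coordinates) fall back to ggplot2's defaults.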

Preliminaries

install.packages(c('ggplot2', 'palmerpenguins', 'ggthemes'))
library(ggplot2)
library(palmerpenguins)
library(ggthemes)

Example 1

Example 1: penguins

Palmer Penguins Data
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
Adelie Torgersen 39.1 18.7 181 3750 male 2007
Adelie Torgersen 39.5 17.4 186 3800 female 2007
Adelie Torgersen 40.3 18.0 195 3250 female 2007
Adelie Torgersen NA NA NA NA NA 2007
Adelie Torgersen 36.7 19.3 193 3450 female 2007
Adelie Torgersen 39.3 20.6 190 3650 male 2007

Example 1: penguins

ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm, color = species)) + 
  geom_point(aes(shape = species), alpha = 2/3, size = 1) + 
  theme_bw() + 
  labs(x = 'Bill length (mm)', y = 'Bill depth (mm)',
       title = 'Bill length vs. Bill depth', subtitle = 'By species',
       color = 'Species', shape = 'Species', caption = 'Source: palmerpenguins') +
  facet_grid(.~island) + 
  theme(aspect.ratio = 1/2,
        legend.position = 'bottom')

Your turn!

Using the penguins dataset, answer the following research question:

  • Do bill length and bill depth differ by sex within each penguin species?
  • Hint: consider the use of color and facets.

Solution

ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm, color = sex)) + 
  geom_point(aes(shape = sex), alpha = 2/3) +
  facet_grid(.~species) + 
  labs(x = 'Bill length (mm)', y = 'Bill depth (mm)',
       title = 'Bill length vs. Bill depth', subtitle = 'By sex',
       color = 'Sex', shape = 'Sex', caption = 'Source: palmerpenguins package') +
  theme_bw() + 
  theme(aspect.ratio = 1/2)

Example 2

ggplot(data = economics, mapping = aes(x = date, y = uempmed)) + 
  # geom_line(color = 'black') + 
  geom_area(fill = 'skyblue', color = 'black') +
  theme_bw() + 
  labs(x = '',
       y = 'Median duration of unemployment\n(in weeks)',
       title = 'Longer unemployment during Great Recession',
       subtitle = 'in the United States',
       caption = 'Source: ggplot2::economics') + 
  scale_x_date(date_breaks = '5 years', date_labels = '%Y') + 
  scale_y_continuous(limits = c(0,27), expand = c(0,0)) + 
  theme(aspect.ratio = 1/2)

Your turn!

Use the economics2000s dataset created below from the economics dataset. Create a visualization using geom_bar() to plot the average unemployment rate for each year. Additionally, use theme() to further customize your plot!

Hints

  • geom_bar() requires the argument stat = 'identity' when both x and y are mapped in the aes() function.

  • Use ?theme or visit https://ggplot2.tidyverse.org/reference/theme.html to see the available options.
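
The first hint can be tried on a small made-up data frame before moving to the real data. In the sketch below, toy and its values are invented for illustration; geom_col() is ggplot2's shortcut for bars whose heights come directly from a y variable, so both plots should draw the same bars.

```r
library(ggplot2)

# Made-up data: one pre-computed value per group
toy <- data.frame(group = c('a', 'b', 'c'), value = c(3, 5, 2))

# stat = 'identity' tells geom_bar() to use y as given instead of counting rows
p_bar <- ggplot(toy, aes(x = group, y = value)) +
  geom_bar(stat = 'identity')

# geom_col() uses stat = 'identity' by default
p_col <- ggplot(toy, aes(x = group, y = value)) +
  geom_col()
```

Without stat = 'identity', geom_bar() counts rows and stops with an error when a y aesthetic is also supplied.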

library(dplyr)
library(lubridate)
economics2000s <- economics %>% 
  mutate(year = year(date)) %>% 
  filter(year >= 2000) %>% 
  group_by(year) %>% 
  summarise(mean_unemploy = mean(100*unemploy/pop))

Solution

economics2000s %>% 
  ggplot(mapping = aes(x = year, y = mean_unemploy)) + 
  geom_bar(stat = 'identity', width = 1,
           color = 'black', fill = 'skyblue') + 
  labs(x = '', y = 'Unemployment rate (%)', title = 'Unemployment rate in the United States',
       subtitle = '2000 to 2015') +
  scale_y_continuous(limits = c(0, 5), expand = c(0,0)) + 
  scale_x_continuous(breaks = 2000:2015) +
  theme_bw() +
  theme(aspect.ratio = 1/2, 
        panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank(),
        plot.background = element_rect(color = 'black', fill = 'grey80'),
        plot.title = element_text(size = 16, family = 'times', face = 'bold', hjust = 0.5),
        plot.subtitle = element_text(size = 12, family = 'times', hjust = 0.5))

Saving your visualization

myplot <- economics2000s %>% 
  ggplot(mapping = aes(x = year, y = mean_unemploy)) + 
  geom_bar(stat = 'identity', width = 1,
           color = 'black', fill = 'skyblue') + 
  labs(x = '', y = 'Unemployment rate (%)', title = 'Unemployment rate in the United States') +
  scale_y_continuous(limits = c(0, 5), expand = c(0,0)) + 
  scale_x_continuous(breaks = 2000:2015) +
  theme_bw() + theme(aspect.ratio = 1/2)

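# myplot was assigned but never printed, so pass it explicitly;
# otherwise ggsave() saves the last plot that was displayed
ggsave('unemploy.png', plot = myplot, width = 6, dpi = 600)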
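
When only width is given, ggsave() takes the height from the current graphics device, so the saved size can vary between sessions. The sketch below sets both dimensions explicitly. It is an illustration only: the mtcars plot, the temporary file location, and the name out_file are assumptions, not part of the workshop example.

```r
library(ggplot2)

# A small stand-in plot so the example runs on its own
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Write to a temporary folder; replace with your own path in practice
out_file <- file.path(tempdir(), 'example_plot.png')

# Fix width, height, and units so the output size is reproducible
ggsave(out_file, plot = p, width = 6, height = 3, units = 'in', dpi = 300)
```

The file extension decides the output format, so changing '.png' to '.pdf' would write a PDF instead.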

Your turn!

Practice using the skills you learned on a dataset of your choice. If you do not have one, use the diamonds dataset from the ggplot2 package.

Wrap-up

Thank you

Additional resources

Visit our website to schedule an appointment! https://statistics.unl.edu/sc3lhelp-desk/